259 research outputs found

    Generating Visual Representations for Zero-Shot Classification

    Full text link
    This paper addresses the task of learning an image classifier when some categories are defined by semantic descriptions only (e.g. visual attributes) while the others are also defined by exemplar images. This task is often referred to as Zero-Shot Classification (ZSC). Most previous methods rely on learning a common embedding space in which visual features of unknown categories can be compared with semantic descriptions. This paper argues that these approaches are limited because i) efficient discriminative classifiers cannot be used and ii) classification tasks mixing seen and unseen categories (Generalized Zero-Shot Classification, or GZSC) cannot be addressed efficiently. In contrast, this paper proposes to address ZSC and GZSC by i) learning a conditional generator from the seen classes and ii) generating artificial training examples for the categories without exemplars. ZSC is then turned into a standard supervised learning problem. Experiments with 4 generative models and 5 datasets validate the approach, giving state-of-the-art results on both ZSC and GZSC.
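
    As a rough illustration of the "generate, then classify" idea, the toy sketch below stands in a ridge regression from class attributes to feature means for the paper's conditional generative models; all dimensions, class counts and the nearest-centroid classifier are assumptions made for the example, not details taken from the paper.

    # Toy GZSC pipeline in the spirit of "generate artificial examples, then train
    # a standard classifier". Everything numeric here is illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n_seen, n_unseen, d_feat, d_attr = 8, 3, 64, 4

    # Attribute vectors are known for every class, seen and unseen.
    attrs = rng.normal(size=(n_seen + n_unseen, d_attr))

    # Real training features exist only for the seen classes.
    true_map = rng.normal(size=(d_attr, d_feat))
    def sample_features(class_ids, n_per_class=50, noise=0.3):
        feats, labels = [], []
        for c in class_ids:
            mu = attrs[c] @ true_map
            feats.append(mu + noise * rng.normal(size=(n_per_class, d_feat)))
            labels += [c] * n_per_class
        return np.vstack(feats), np.array(labels)

    X_seen, y_seen = sample_features(range(n_seen))

    # 1) "Conditional generator": ridge regression from attributes to per-class
    #    feature means, fitted on seen classes only (a stand-in for a GAN/VAE).
    seen_means = np.vstack([X_seen[y_seen == c].mean(0) for c in range(n_seen)])
    A = attrs[:n_seen]
    W = np.linalg.solve(A.T @ A + 1e-3 * np.eye(d_attr), A.T @ seen_means)

    # 2) Generate artificial training examples for the unseen classes.
    X_fake, y_fake = [], []
    for c in range(n_seen, n_seen + n_unseen):
        X_fake.append(attrs[c] @ W + 0.3 * rng.normal(size=(50, d_feat)))
        y_fake += [c] * 50
    X_fake, y_fake = np.vstack(X_fake), np.array(y_fake)

    # 3) GZSC becomes ordinary supervised learning: here a nearest-centroid rule
    #    trained on real seen features plus generated unseen features.
    X_train = np.vstack([X_seen, X_fake])
    y_train = np.concatenate([y_seen, y_fake])
    centroids = np.vstack([X_train[y_train == c].mean(0)
                           for c in range(n_seen + n_unseen)])

    X_test, y_test = sample_features(range(n_seen + n_unseen), n_per_class=20)
    pred = np.argmin(((X_test[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    print("toy GZSC accuracy over seen + unseen classes:", (pred == y_test).mean())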

    Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification

    Full text link
    This paper addresses the task of zero-shot image classification. The key contribution of the proposed approach is to control the semantic embedding of images -- one of the main ingredients of zero-shot learning -- by formulating it as a metric learning problem. The optimized empirical criterion combines two types of sub-task constraints: metric discriminating capacity and accurate attribute prediction. This results in a novel formulation of zero-shot learning that does not require the notion of class during the training phase: only pairs of image/attributes, augmented with a consistency indicator, are given as ground truth. At test time, the learned model can predict the consistency of a test image with a given set of attributes, allowing flexible ways to produce recognition inferences. Despite its simplicity, the proposed approach gives state-of-the-art results on four challenging datasets used for zero-shot recognition evaluation. (In ECCV 2016, Oct 2016, Amsterdam, Netherlands.)
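
    A minimal sketch of the pair-based formulation, assuming synthetic data, a plain linear embedding and a contrastive hinge criterion; these stand in for the paper's actual metric-learning objective and attribute-prediction constraints.

    # Illustrative (image, attributes, consistency) pairs and a contrastive
    # embedding; sizes, losses and data are assumptions, not the paper's setup.
    import numpy as np

    rng = np.random.default_rng(1)
    d_img, d_attr, n_pairs = 32, 10, 400

    # Synthetic ground truth: image features are a noisy linear function of attributes.
    M_true = rng.normal(size=(d_attr, d_img))
    attrs = rng.normal(size=(n_pairs, d_attr))
    imgs = attrs @ M_true + 0.1 * rng.normal(size=(n_pairs, d_img))

    # Consistency indicator: 1 for matched pairs, 0 for shuffled (mismatched) ones.
    perm = rng.permutation(n_pairs)
    attrs_all = np.vstack([attrs, attrs[perm]])
    imgs_all = np.vstack([imgs, imgs])
    consistent = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])

    # Linear embedding of images into attribute space; the squared distance acts
    # as an (in)consistency score, trained with a contrastive hinge loss.
    W = 0.01 * rng.normal(size=(d_img, d_attr))
    margin, lr = 1.0, 2e-3
    for _ in range(500):
        diff = imgs_all @ W - attrs_all
        dist2 = (diff ** 2).sum(1)
        # consistent pairs are pulled together, inconsistent ones pushed past the margin
        pull = consistent[:, None] * diff
        push = ((1 - consistent) * (dist2 < margin))[:, None] * (-diff)
        grad = imgs_all.T @ (2 * (pull + push)) / len(dist2)
        W -= lr * grad

    # Test-time use: rank candidate attribute vectors by their consistency with an image.
    scores = ((imgs[0] @ W - attrs) ** 2).sum(1)
    print("rank of the true attribute vector:", int(np.argsort(scores).tolist().index(0)))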

    Un algorithme efficace de suivi d'objets dans des séquences d'images [An efficient algorithm for tracking objects in image sequences]

    Get PDF
    In this article we propose an approach for tracking the motion of a visual pattern in an image sequence efficiently and quickly. The technique consists, on the one hand, of an offline stage dedicated to learning an interaction matrix linking the deformation of the pattern to its displacement in the image, and, on the other hand, of the online use of this matrix to follow the evolution of the chosen pattern. This second, iterative stage consists of predicting the location of the pattern in the image (in position, scale and orientation), computing the difference between the pattern observed at the predicted location and the reference pattern, and then multiplying the interaction matrix by this difference to obtain a corrective vector for the predicted position. We show that this correction step has a very low computational cost, allowing a real-time video implementation. In the experimental section, we apply this principle first to the tracking of a textured pattern in an image sequence, then to the tracking of volumetric objects (in which case the reference pattern evolves over time according to the relative object/camera orientation). Numerous experimental results are presented and discussed.
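
    The predict/compare/correct loop can be illustrated on a one-dimensional toy problem; the synthetic signal, window size and least-squares estimation of the interaction matrix below are assumptions made for the example, not the paper's implementation.

    # 1-D analogue of interaction-matrix tracking: learn offline how intensity
    # differences relate to displacement, then correct predictions online.
    import numpy as np

    rng = np.random.default_rng(2)
    signal = np.convolve(rng.normal(size=500), np.ones(15) / 15, mode="same")

    win, ref_pos = 40, 200
    ref = signal[ref_pos:ref_pos + win]            # reference pattern (the template)

    # Offline learning: apply known displacements to the pattern, record the intensity
    # differences, and solve for A in  displacement ~= A @ (observed - reference).
    disps = np.arange(-8, 9)
    diffs = np.stack([signal[ref_pos + d:ref_pos + d + win] - ref for d in disps])
    A, *_ = np.linalg.lstsq(diffs, disps.astype(float), rcond=None)

    # Online tracking on a new "frame" where the pattern has moved by +6 samples:
    # predict, measure the difference, multiply by A, correct, and iterate.
    shift = 6
    frame = np.roll(signal, shift)                 # the pattern now sits at ref_pos + 6
    est = float(ref_pos)                           # predicted position of the pattern
    for _ in range(5):
        p = int(round(est))
        obs = frame[p:p + win]
        est -= float(A @ (obs - ref))              # one dot product per correction
    print("estimated position:", round(est, 2), "true position:", ref_pos + shift)

    In the paper's setting the correction also covers scale and orientation and the pattern is two-dimensional, but the structure of the loop -- one matrix-vector product per iteration -- is the same.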

    The Many Moods of Emotion

    Full text link
    This paper presents a novel approach to the facial expression generation problem. Building upon the assumption of the psychological community that emotion is intrinsically continuous, we first design our own continuous emotion representation, a 3-dimensional latent space obtained from a neural network trained on discrete emotion classification. The resulting representation can be used to annotate large in-the-wild datasets, which are later used to train a Generative Adversarial Network. We first show that our model is able to map back to discrete emotion classes with an objectively and subjectively better image quality than usual discrete approaches, and also that it is able to cover the larger space of possible facial expressions, generating the many moods of emotion. Moreover, two axes of this space are found to generate expression changes similar to those of traditional continuous representations such as arousal-valence. Finally, we show from visual interpretation that the third remaining dimension is highly related to the well-known dominance dimension from psychology.

    Real Time 3D Template Matching

    Get PDF
    One of the most popular methods to extract useful information from an image sequence is the template matching approach. In this well-known method, the tracking of a certain feature or target over time is based on the comparison of the content of each image with a sample template. We propose a 3D template matching algorithm that is able to track targets corresponding to the projection of 3D surfaces. With only a few hundred subtractions and multiplications per frame, our algorithm provides, in real time, an estimation of the 3D surface pose. The key idea is to compute the difference between the current image content and the visual aspect of the target under the predicted spatial attitude. This difference image is then converted into corrections on the 3D location parameters.
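
    A similar sketch with two pose parameters, assuming a synthetic rendering function in place of the real target appearance; it only illustrates the difference-image-to-pose-correction idea, not the paper's actual 3D surface model.

    # Toy 2-parameter version of the predict/compare/correct idea; the rendering
    # function and pose parameters are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(3)
    size = 32
    yy, xx = np.mgrid[0:size, 0:size]

    def render(cx, cy):
        """Appearance of the target under a given pose (here just a Gaussian blob)."""
        return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 50.0).ravel()

    ref_pose = np.array([16.0, 16.0])
    ref = render(*ref_pose)

    # Offline: learn the interaction matrix A from random pose perturbations, i.e.
    # solve  delta_pose ~= (render(pose + delta) - render(pose)) @ A  in least squares.
    deltas = rng.uniform(-4, 4, size=(200, 2))
    diffs = np.stack([render(*(ref_pose + d)) - ref for d in deltas])
    A, *_ = np.linalg.lstsq(diffs, deltas, rcond=None)       # shape (n_pixels, 2)

    # Online: in a frame where the target has moved, a handful of cheap corrections
    # (one matrix-vector product each) drives the pose estimate toward the true pose.
    true_pose = ref_pose + np.array([2.4, -1.7])
    frame = render(*true_pose)
    est = ref_pose.copy()
    for _ in range(8):
        diff = frame - render(*est)      # difference image at the predicted pose
        est += diff @ A                  # pose correction from the interaction matrix
    print("estimated pose:", est.round(2), "true pose:", true_pose)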

    Local Higher-Order Statistics (LHS) describing images with statistics of local non-binarized pixel patterns

    Get PDF
    We propose a new image representation for texture categorization and facial analysis, relying on the use of higher-order local differential statistics as features. It has recently been shown that distributions of small local pixel patterns can be highly discriminative while being extremely efficient to compute, in contrast to models based on the global structure of images. Motivated by such works, we propose to use higher-order statistics of local non-binarized pixel patterns for image description. The proposed model requires neither (i) a user-specified quantization of the space (of pixel patterns) nor (ii) any heuristics for discarding low-occupancy volumes of the space. Instead, we use a data-driven soft quantization of the space, with parametric mixture models, combined with higher-order statistics based on Fisher scores. We demonstrate that this leads to a more expressive representation which, when combined with discriminatively learned classifiers and metrics, achieves state-of-the-art performance on challenging texture and facial analysis datasets in a low-complexity setup. Further, it is complementary to higher-complexity features and improves performance when combined with them.
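
    A hedged sketch of the general recipe (raw, non-binarized neighbour differences, a Gaussian mixture as the data-driven soft quantizer, Fisher-score pooling); the component count, normalization and random test images are assumptions, not the paper's settings.

    # Illustrative local-pattern Fisher descriptor; parameters are toy choices.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(4)

    def local_patterns(img):
        """8-D vectors of raw (non-binarized) differences to the 8 neighbours."""
        h, w = img.shape
        c = img[1:-1, 1:-1]
        neigh = [img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
        return np.stack([n - c for n in neigh], axis=-1).reshape(-1, 8)

    def fisher_vector(patterns, gmm):
        """Gradient-of-log-likelihood statistics w.r.t. the GMM means."""
        gamma = gmm.predict_proba(patterns)                  # soft assignments
        mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
        fv = []
        for k in range(gmm.n_components):
            diff = (patterns - mu[k]) / np.sqrt(var[k])
            fv.append((gamma[:, k:k + 1] * diff).mean(0) / np.sqrt(w[k]))
        fv = np.concatenate(fv)
        return fv / (np.linalg.norm(fv) + 1e-12)

    # Fit the soft quantizer on patterns pooled from "training" images ...
    train_imgs = [rng.random((48, 48)) for _ in range(5)]
    all_pat = np.vstack([local_patterns(im) for im in train_imgs])
    gmm = GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(all_pat)

    # ... then describe any image by its Fisher vector (ready for a linear classifier).
    desc = fisher_vector(local_patterns(rng.random((48, 48))), gmm)
    print("descriptor dimension:", desc.shape[0])            # 4 components x 8 dims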

    CAKE: Compact and Accurate K-dimensional representation of Emotion

    Get PDF
    Numerous models describing human emotional states have been built by the psychology community. Alongside, Deep Neural Networks (DNN) are reaching excellent performance and are becoming interesting feature extraction tools in many computer vision tasks. Inspired by works from the psychology community, we first study the link between the compact two-dimensional representation of emotion known as arousal-valence and the discrete emotion classes (e.g. anger, happiness, sadness, etc.) used in the computer vision community. This enables us to assess the benefits -- in terms of discrete emotion inference -- of adding an extra dimension to arousal-valence (usually named dominance). Building on these observations, we propose CAKE, a 3-dimensional representation of emotion learned in a multi-domain fashion, achieving accurate emotion recognition on several public datasets. Moreover, we visualize how emotion boundaries are organized inside DNN representations and show that DNNs are implicitly learning arousal-valence-like descriptions of emotions. Finally, we use the CAKE representation to compare the quality of the annotations of different public datasets.
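
    The bottleneck idea behind a compact K-dimensional emotion code can be sketched as follows, assuming synthetic features, a linear encoder and read-out, and plain softmax training rather than the paper's deep networks and multi-domain setup.

    # Toy K-dimensional bottleneck trained only with discrete emotion labels;
    # data, sizes and the training procedure are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(5)
    n_classes, d_feat, K, n = 6, 40, 3, 900

    # Synthetic stand-ins for face features: each discrete emotion is a noisy cluster.
    centers = rng.normal(size=(n_classes, d_feat))
    y = rng.integers(0, n_classes, size=n)
    X = centers[y] + 0.5 * rng.normal(size=(n, d_feat))

    W1 = 0.1 * rng.normal(size=(d_feat, K))      # encoder to the K-dim emotion code
    W2 = 0.1 * rng.normal(size=(K, n_classes))   # read-out to discrete classes
    lr = 0.05
    for _ in range(2000):
        Z = X @ W1                                # compact K-dimensional code
        logits = Z @ W2
        logits -= logits.max(1, keepdims=True)    # numerical stability
        P = np.exp(logits)
        P /= P.sum(1, keepdims=True)
        G = P.copy()
        G[np.arange(n), y] -= 1.0                 # softmax cross-entropy gradient
        G /= n
        grad_W2 = Z.T @ G
        grad_W1 = X.T @ (G @ W2.T)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2

    # The 3-D code is the emotion representation; discrete labels stay recoverable
    # from it, and its axes can be compared against arousal/valence/dominance scales.
    codes = X @ W1
    acc = float(((codes @ W2).argmax(1) == y).mean())
    print("bottleneck dimension:", K, "read-out accuracy:", round(acc, 3))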

    Autocalibration itérative de caméras à partir de structures planes [Iterative camera autocalibration from planar structures]

    Get PDF
    In this paper we present an iterative algorithm for camera autocalibration from five or more views of an unknown planar scene. Although our method works as soon as five views are available, it is aimed at the particular case of "real-time video" autocalibration problems, which involve a large number of images; most algorithms proposed in the literature are poorly suited to this setting, where the volume of data is very large. We use a stratified approach to compute a metric reconstruction of the camera and scene structure. In a first step, we compute a projective reconstruction of the scene structure using planar homographies. This intermediate reconstruction is then fed to a Kalman filter, which rectifies it to recover metric properties: each new image yields a projective reconstruction of the observed plane that refines the metric structure of the scene through the filter.
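
    A simplified sketch of how plane-induced homographies constrain the calibration view after view; it assumes zero skew, a principal point at the origin, plane-to-image homographies given directly, and a plain recursive accumulation of constraints in place of the paper's Kalman filter and projective-to-metric upgrade.

    # Zhang-style stand-in: each view of the plane adds two linear constraints on
    # omega = K^-T K^-1; everything here is illustrative, not the paper's method.
    import numpy as np

    rng = np.random.default_rng(6)
    K_true = np.diag([800.0, 750.0, 1.0])        # unknown focal lengths to recover

    def random_pose():
        """Random rotation (moderate angles) and translation of the plane Z = 0."""
        a, b, c = rng.uniform(-0.4, 0.4, size=3)
        Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
        Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
        Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
        R = Rz @ Ry @ Rx
        t = np.array([rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(3, 6)])
        return R, t

    def plane_homography(K, R, t):
        """H mapping plane points (X, Y, 1) to the image, up to scale."""
        return K @ np.column_stack([R[:, 0], R[:, 1], t])

    # With zero skew and principal point at the origin, omega = diag(w1, w2, w3);
    # each view gives:  h1' omega h2 = 0  and  h1' omega h1 - h2' omega h2 = 0.
    info = np.zeros((3, 3))                      # running information matrix
    for view in range(1, 8):
        H = plane_homography(K_true, *random_pose())
        h1, h2 = H[:, 0], H[:, 1]
        for c in (h1 * h2, h1 * h1 - h2 * h2):   # constraint coefficient vectors
            info += np.outer(c, c)               # recursive accumulation per view
        vals, vecs = np.linalg.eigh(info)
        w = vecs[:, 0]                           # current null vector estimate
        w = w / w[2]
        if view >= 2 and w[0] > 0 and w[1] > 0:
            print(f"after {view} views: fx ~ {np.sqrt(1 / w[0]):.1f}, "
                  f"fy ~ {np.sqrt(1 / w[1]):.1f}")

    The paper's method additionally estimates the structure of the unknown plane and propagates uncertainty through the Kalman filter; the sketch only shows how each new view adds linear constraints that refine the calibration.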

    Object recognition: solution of the simultaneous pose and correspondence problem

    Get PDF
    The use of hypothesis verification is recurrent in the model-based recognition literature. Verification consists in measuring how many model features, transformed by a pose, coincide with image features. When the data involved in the computation of the pose are noisy, the pose is inaccurate and difficult to verify, especially when the objects are partially occluded. To address this problem, the noise in image features is modeled by a Gaussian distribution. A probabilistic framework allows the evaluation of the probability of a matching, knowing that the pose belongs to a rectangular volume of the pose space; when the transformation is affine, this involves quadratic programming. This matching probability is used in an algorithm computing the best pose, which consists in a recursive multi-resolution exploration of the pose space, discarding outliers in the match data as the search progresses. Objects to be recognized are represented by three-dimensional models composed of visual cues; recognizing an object means matching the cues of its model with cues extracted from the image, so that the latter can be explained as a geometric transformation of the model cues. The search for the pose (the parameters of the transformation aligning the model with the image) and the search for the correspondences are treated simultaneously, which constitutes the originality and the strength of the proposed method. Numerous experimental results are described, consisting of 2D and 3D recognition experiments using the proposed algorithm.
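
    A toy sketch of the recursive multi-resolution exploration of the pose space, restricted to 2-D translations and to a simple counting bound per cell instead of the paper's Gaussian noise model and quadratic programming.

    # Best-first subdivision of a translation-only pose space; data, tolerances
    # and the counting bound are illustrative assumptions.
    import heapq
    import numpy as np

    rng = np.random.default_rng(7)
    model = rng.uniform(0, 10, size=(12, 2))          # 2-D model features
    true_t = np.array([23.0, -11.0])
    image = np.vstack([model[:9] + true_t + 0.05 * rng.normal(size=(9, 2)),
                       rng.uniform(-40, 40, size=(20, 2))])   # 9 inliers + clutter

    eps = 0.2                                          # feature noise tolerance

    def upper_bound(center, half):
        """Max number of model points matchable by ANY translation in the cell."""
        proj = model + center                          # model placed at cell center
        d = np.linalg.norm(proj[:, None] - image[None], axis=-1).min(1)
        return int((d <= eps + np.sqrt(2) * half).sum())

    # Best-first recursive subdivision of the translation space [-40, 40]^2.
    root = (np.zeros(2), 40.0)
    heap = [(-upper_bound(*root), 0, root)]
    tie = 1
    while True:
        bound, _, (center, half) = heapq.heappop(heap)
        if half < 0.1:                                 # cell small enough: done
            break
        for dx in (-0.5, 0.5):                         # split the cell in four
            for dy in (-0.5, 0.5):
                child = (center + half * np.array([dx, dy]), half / 2)
                heapq.heappush(heap, (-upper_bound(*child), tie, child))
                tie += 1
    print("estimated translation:", center.round(2),
          "matched model points:", -bound, "true translation:", true_t)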